We want to cluster football players.
The data we will analyze are football player records with FIFA ratings (the player pages link to sofifa.com). For each player there are personal and club details, market value and wage, and a large set of skill ratings. We only briefly recall the data here; a more detailed description is in the previous report on classification.
## [1] 19239 107
For example, this is what the rows for Lionel Messi and Cristiano Ronaldo (rows 1 and 3) look like:
| variable | column | 1 | 3 |
|---|---|---|---|
| sofifa_id | 1 | 158023 | 20801 |
| player_url | 2 | https://sofifa.com/player/158023/lionel-messi/220002 | https://sofifa.com/player/20801/c-ronaldo-dos-santos-aveiro/220002 |
| short_name | 3 | L. Messi | Cristiano Ronaldo |
| long_name | 4 | Lionel Andrés Messi Cuccittini | Cristiano Ronaldo dos Santos Aveiro |
| player_positions | 5 | RW, ST, CF | ST, LW |
| overall | 6 | 93 | 91 |
| potential | 7 | 93 | 91 |
| value_eur | 8 | 78000000 | 45000000 |
| wage_eur | 9 | 320000 | 270000 |
| age | 10 | 34 | 36 |
| dob | 11 | 1987-06-24 | 1985-02-05 |
| height_cm | 12 | 170 | 187 |
| weight_kg | 13 | 72 | 83 |
| club_name | 14 | Paris Saint-Germain | Manchester United |
| league_name | 15 | French Ligue 1 | English Premier League |
| league_level | 16 | 1 | 1 |
| club_position | 17 | RW | ST |
| club_jersey_number | 18 | 30 | 7 |
| club_loaned_from | 19 | | |
| club_joined | 20 | 2021-08-10 | 2021-08-27 |
| club_contract_valid_until | 21 | 2023 | 2023 |
| nationality | 22 | Argentina | Portugal |
| nation_position | 23 | RW | ST |
| nation_jersey_number | 24 | 10 | 7 |
| preferred_foot | 25 | Left | Right |
| weak_foot | 26 | 4 | 4 |
| skill_moves | 27 | 4 | 5 |
| international_reputation | 28 | 5 | 5 |
| work_rate | 29 | Medium/Low | High/Low |
| body_type | 30 | Unique | Unique |
| real_face | 31 | Yes | Yes |
| release_clause_eur | 32 | 144300000 | 83300000 |
| player_tags | 33 | #Dribbler, #Distance Shooter, #FK Specialist, #Acrobat, #Clinical Finisher, #Complete Forward | #Aerial Threat, #Dribbler, #Distance Shooter, #Crosser, #Acrobat, #Clinical Finisher, #Complete Forward |
| player_traits | 34 | Finesse Shot, Long Shot Taker (AI), Playmaker (AI), Outside Foot Shot, One Club Player, Chip Shot (AI), Technical Dribbler (AI) | Power Free-Kick, Flair, Long Shot Taker (AI), Speed Dribbler (AI), Outside Foot Shot |
| pace | 35 | 85 | 87 |
| shooting | 36 | 92 | 94 |
| passing | 37 | 91 | 80 |
| dribbling | 38 | 95 | 87 |
| defending | 39 | 34 | 34 |
| physic | 40 | 65 | 75 |
| attacking_crossing | 41 | 85 | 87 |
| attacking_finishing | 42 | 95 | 95 |
| attacking_heading_accuracy | 43 | 70 | 90 |
| attacking_short_passing | 44 | 91 | 80 |
| attacking_volleys | 45 | 88 | 86 |
| skill_dribbling | 46 | 96 | 88 |
| skill_curve | 47 | 93 | 81 |
| skill_fk_accuracy | 48 | 94 | 84 |
| skill_long_passing | 49 | 91 | 77 |
| skill_ball_control | 50 | 96 | 88 |
| movement_acceleration | 51 | 91 | 85 |
| movement_sprint_speed | 52 | 80 | 88 |
| movement_agility | 53 | 91 | 86 |
| movement_reactions | 54 | 94 | 94 |
| movement_balance | 55 | 95 | 74 |
| power_shot_power | 56 | 86 | 94 |
| power_jumping | 57 | 68 | 95 |
| power_stamina | 58 | 72 | 77 |
| power_strength | 59 | 69 | 77 |
| power_long_shots | 60 | 94 | 93 |
| mentality_aggression | 61 | 44 | 63 |
| mentality_interceptions | 62 | 40 | 29 |
| mentality_positioning | 63 | 93 | 95 |
| mentality_vision | 64 | 95 | 76 |
| mentality_penalties | 65 | 75 | 88 |
| mentality_composure | 66 | 96 | 95 |
| defending_marking_awareness | 67 | 20 | 24 |
| defending_standing_tackle | 68 | 35 | 32 |
| defending_sliding_tackle | 69 | 24 | 24 |
| goalkeeping_diving | 70 | 6 | 7 |
| goalkeeping_handling | 71 | 11 | 11 |
| goalkeeping_kicking | 72 | 15 | 15 |
| goalkeeping_positioning | 73 | 14 | 14 |
| goalkeeping_reflexes | 74 | 8 | 11 |
| goalkeeping_speed | 75 | NA | NA |
| ls | 76 | 89+3 | 90+1 |
| st | 77 | 89+3 | 90+1 |
| rs | 78 | 89+3 | 90+1 |
| lw | 79 | 92 | 88 |
| lf | 80 | 93 | 89 |
| cf | 81 | 93 | 89 |
| rf | 82 | 93 | 89 |
| rw | 83 | 92 | 88 |
| lam | 84 | 93 | 86+3 |
| cam | 85 | 93 | 86+3 |
| ram | 86 | 93 | 86+3 |
| lm | 87 | 91+2 | 86+3 |
| lcm | 88 | 87+3 | 78+3 |
| cm | 89 | 87+3 | 78+3 |
| rcm | 90 | 87+3 | 78+3 |
| rm | 91 | 91+2 | 86+3 |
| lwb | 92 | 66+3 | 63+3 |
| ldm | 93 | 64+3 | 59+3 |
| cdm | 94 | 64+3 | 59+3 |
| rdm | 95 | 64+3 | 59+3 |
| rwb | 96 | 66+3 | 63+3 |
| lb | 97 | 61+3 | 60+3 |
| lcb | 98 | 50+3 | 53+3 |
| cb | 99 | 50+3 | 53+3 |
| rcb | 100 | 50+3 | 53+3 |
| rb | 101 | 61+3 | 60+3 |
| gk | 102 | 19+3 | 20+3 |
| player_face_url | 103 | https://cdn.sofifa.com/players/158/023/22_120.png | https://cdn.sofifa.com/players/020/801/22_120.png |
| club_logo_url | 104 | https://cdn.sofifa.com/teams/73/60.png | https://cdn.sofifa.com/teams/11/60.png |
| club_flag_url | 105 | https://cdn.sofifa.com/flags/fr.png | https://cdn.sofifa.com/flags/gb-eng.png |
| nation_logo_url | 106 | https://cdn.sofifa.com/teams/1369/60.png | https://cdn.sofifa.com/teams/1354/60.png |
| nation_flag_url | 107 | https://cdn.sofifa.com/flags/ar.png | https://cdn.sofifa.com/flags/pt.png |
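A transposed view like the one above can be sketched in base R as follows. The toy data frame `players_toy` and its three columns are stand-ins for `players_full` (the values are copied from the tables in this report). How the original table was actually produced is not shown, so this is only one plausible way.

```r
# Toy stand-in for players_full: three players, three of the 107 columns
players_toy <- data.frame(
  short_name = c("L. Messi", "R. Lewandowski", "Cristiano Ronaldo"),
  overall    = c(93, 92, 91),
  potential  = c(93, 92, 91),
  stringsAsFactors = FALSE
)

rows <- c(1, 3)                   # rows to display (Messi and Ronaldo)
wide <- t(players_toy[rows, ])    # character matrix: one row per variable

# Prepend the column index so variables can later be selected by position
transposed <- data.frame(column = seq_len(ncol(players_toy)), wide,
                         check.names = FALSE)
print(transposed)
```

The `column` index is what makes numeric selections such as `c(3, 14, 15, 17)` in the code further down readable against the table.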
Let's find all Russian players and look at the table. These players will be useful later for seeing where they ended up after clustering. We also drop players without salary or value information (the filter below uses `value_eur`); they have no fixed club or league, which we will need later.
# Table styling uses kableExtra: kbl(), kable_paper(), column_spec(), spec_color()
# Russian players with a non-zero market value (which() also skips NA comparisons)
nashi_parni <- which(players_full$nationality == "Russia" & players_full$value_eur != 0)
url_nashi <- as.character(players_full$player_url[nashi_parni])
# Selected columns: 3 short_name, 14 club_name, 15 league_name, 17 club_position,
# 6 overall, 7 potential, 8 value_eur, 9 wage_eur; the row names become table column 1
cbind(players_full[nashi_parni, c(3, 14, 15, 17, 6, 7, 8, 9)], "") %>%
  kbl() %>%
  kable_paper("hover", full_width = FALSE, position = "left") %>%
  column_spec(6, color = "black", background = spec_color(players_full[nashi_parni, 6])) %>%
  column_spec(7, color = "black", background = spec_color(players_full[nashi_parni, 7])) %>%
  column_spec(8, color = "black", background = spec_color(players_full[nashi_parni, 8])) %>%
  column_spec(9, color = "black", background = spec_color(players_full[nashi_parni, 9])) %>%
  column_spec(4, color = spec_color(as.integer(players_full[nashi_parni, 15]))) %>%
  column_spec(2, bold = TRUE, link = url_nashi)

| | short_name | club_name | league_name | club_position | overall | potential | value_eur | wage_eur | "" |
|---|---|---|---|---|---|---|---|---|---|
| 221 | Mário Fernandes | PFC CSKA Moscow | Russian Premier League | RB | 82 | 82 | 26500000 | 57000 | |
| 391 | I. Akinfeev | PFC CSKA Moscow | Russian Premier League | GK | 80 | 80 | 2300000 | 26000 | |
| 617 | A. Golovin | AS Monaco | French Ligue 1 | LF | 79 | 83 | 24500000 | 53000 | |
| 759 | R. Zobnin | Spartak Moskva | Russian Premier League | RDM | 78 | 80 | 15000000 | 51000 | |
| 766 | A. Miranchuk | Atalanta | Italian Serie A | SUB | 78 | 80 | 17500000 | 46000 | |
| 807 | A. Lunev | Bayer 04 Leverkusen | German 1. Bundesliga | SUB | 78 | 79 | 11000000 | 39000 | |
| 882 | Guilherme | FC Lokomotiv Moscow | Russian Premier League | GK | 77 | 77 | 1200000 | 22000 | |
| 1034 | G. Dzhikiya | Spartak Moskva | Russian Premier League | LCB | 77 | 79 | 11000000 | 48000 | |
| 1180 | F. Smolov | FC Lokomotiv Moscow | Russian Premier League | RS | 76 | 76 | 6500000 | 47000 | |
| 1181 | A. Dzagoev | PFC CSKA Moscow | Russian Premier League | CAM | 76 | 76 | 6000000 | 40000 | |
| 1251 | D. Cheryshev | Valencia CF | Spain Primera Division | LM | 76 | 76 | 7000000 | 31000 | |
| 1325 | A. Miranchuk | FC Lokomotiv Moscow | Russian Premier League | SUB | 76 | 79 | 10000000 | 42000 | |
| 1561 | A. Kokorin | Fiorentina | Italian Serie A | SUB | 75 | 75 | 5500000 | 48000 | |
| 1806 | D. Barinov | FC Lokomotiv Moscow | Russian Premier League | RCM | 75 | 82 | 10500000 | 33000 | |
| 1890 | A. Sobolev | Spartak Moskva | Russian Premier League | ST | 75 | 80 | 8500000 | 45000 | |
| 2032 | G. Schennikov | PFC CSKA Moscow | Russian Premier League | SUB | 74 | 74 | 3600000 | 33000 | |
| 2283 | Z. Bakaev | Spartak Moskva | Russian Premier League | SUB | 74 | 78 | 6000000 | 41000 | |
| 2306 | R. Zhemaletdinov | FC Lokomotiv Moscow | Russian Premier League | RM | 74 | 78 | 6000000 | 33000 | |
| 2307 | A. Maksimenko | Spartak Moskva | Russian Premier League | GK | 74 | 81 | 7000000 | 26000 | |
| 2320 | F. Chalov | PFC CSKA Moscow | Russian Premier League | ST | 74 | 80 | 6500000 | 34000 | |
| 2932 | I. Oblyakov | PFC CSKA Moscow | Russian Premier League | LB | 73 | 80 | 6000000 | 29000 | |
| 3039 | I. Diveev | PFC CSKA Moscow | Russian Premier League | RCB | 73 | 82 | 6500000 | 21000 | |
| 3166 | V. Vasin | PFC CSKA Moscow | Russian Premier League | SUB | 72 | 72 | 1500000 | 26000 | |
| 3386 | A. Selikhov | Spartak Moskva | Russian Premier League | SUB | 72 | 74 | 2100000 | 26000 | |
| 3509 | I. Akhmetov | PFC CSKA Moscow | Russian Premier League | RDM | 72 | 78 | 3700000 | 25000 | |
| 3542 | D. Zhivoglyadov | FC Lokomotiv Moscow | Russian Premier League | SUB | 72 | 72 | 2300000 | 29000 | |
| 3544 | S. Iljutcenko | Jeonbuk Hyundai Motors | Korean K League 1 | ST | 72 | 72 | 2300000 | 10000 | |
| 3694 | K. Kuchaev | PFC CSKA Moscow | Russian Premier League | RM | 72 | 79 | 4700000 | 25000 | |
| 3877 | F. Kudryashov | Antalyaspor | Turkish Süper Lig | SUB | 71 | 71 | 700000 | 11000 | |
| 3898 | K. Rausch | | German 2. Bundesliga | SUB | 71 | 71 | 1400000 | 8000 | |
| 4078 | I. Kutepov | Spartak Moskva | Russian Premier League | SUB | 71 | 72 | 1900000 | 28000 | |
| 4215 | R. Mirzov | Spartak Moskva | Russian Premier League | SUB | 71 | 71 | 1900000 | 32000 | |
| 4521 | N. Rasskazov | Spartak Moskva | Russian Premier League | RB | 71 | 76 | 2600000 | 24000 | |
| 4554 | S. Magkeev | FC Lokomotiv Moscow | Russian Premier League | RCB | 71 | 80 | 4000000 | 22000 | |
| 4602 | A. Rebrov | Spartak Moskva | Russian Premier League | SUB | 70 | 70 | 180000 | 13000 | |
| 4621 | K. Nababkin | PFC CSKA Moscow | Russian Premier League | SUB | 70 | 70 | 525000 | 20000 | |
| 4624 | A. Eschenko | Spartak Moskva | Russian Premier League | SUB | 70 | 70 | 350000 | 18000 | |
| 4695 | E. Prib | Fortuna Düsseldorf | German 2. Bundesliga | LDM | 70 | 70 | 1300000 | 14000 | |
| 4714 | A. Zabolotnyi | PFC CSKA Moscow | Russian Premier League | SUB | 70 | 70 | 1600000 | 24000 | |
| 5338 | D. Kulikov | FC Lokomotiv Moscow | Russian Premier League | LCM | 70 | 79 | 3300000 | 19000 | |
| 5369 | D. Rybchinskiy | FC Lokomotiv Moscow | Russian Premier League | LM | 70 | 78 | 3600000 | 21000 | |
| 5370 | N. Umyarov | Spartak Moskva | Russian Premier League | SUB | 70 | 79 | 3400000 | 18000 | |
| 5945 | A. Zhirov | SV Sandhausen | German 2. Bundesliga | LCB | 69 | 69 | 1100000 | 4000 | |
| 6282 | K. Maradishvili | FC Lokomotiv Moscow | Russian Premier League | RES | 69 | 77 | 3000000 | 13000 | |
| 6283 | P. Maslov | Spartak Moskva | Russian Premier League | RES | 69 | 78 | 3000000 | 15000 | |
| 6403 | A. Silyanov | FC Lokomotiv Moscow | Russian Premier League | RB | 69 | 78 | 2900000 | 13000 | |
| 6539 | E. Bashkirov | Zagłębie Lubin | Polish T-Mobile Ekstraklasa | RDM | 68 | 68 | 1000000 | 4000 | |
| 6812 | A. Vasyutin | Djurgårdens IF | Swedish Allsvenskan | SUB | 68 | 73 | 1400000 | 3000 | |
| 6842 | N. Haikin | FK Bodø/Glimt | Norwegian Eliteserien | GK | 68 | 73 | 1400000 | 2000 | |
| 7073 | I. Zlobin | Futebol Clube de Famalicão | Portuguese Liga ZON SAGRES | SUB | 68 | 76 | 2300000 | 3000 | |
| 7437 | M. Mukhin | PFC CSKA Moscow | Russian Premier League | LDM | 68 | 79 | 2600000 | 9000 | |
| 8255 | M. Suleymanov | GZT Giresunspor | Turkish Süper Lig | SUB | 67 | 71 | 1500000 | 5000 | |
| 9347 | I. Zhigulev | Zagłębie Lubin | Polish T-Mobile Ekstraklasa | SUB | 66 | 71 | 1200000 | 3000 | |
| 9507 | N. Tiknizyan | FC Lokomotiv Moscow | Russian Premier League | RES | 66 | 77 | 1900000 | 11000 | |
| 9518 | A. Lomovitskiy | Spartak Moskva | Russian Premier League | SUB | 66 | 74 | 1900000 | 12000 | |
| 10292 | G. Melkadze | Spartak Moskva | Russian Premier League | SUB | 65 | 70 | 1100000 | 11000 | |
| 10540 | I. Shinozuka | Kashiwa Reysol | Japanese J. League Division 1 | RES | 65 | 66 | 850000 | 3000 | |
| 10694 | M. Ignatov | Spartak Moskva | Russian Premier League | SUB | 65 | 78 | 1800000 | 9000 | |
| 10746 | I. Gaponov | Spartak Moskva | Russian Premier League | RES | 65 | 74 | 1500000 | 10000 | |
| 11038 | M. Nenakhov | FC Lokomotiv Moscow | Russian Premier League | RES | 65 | 72 | 1400000 | 9000 | |
| 11306 | A. Mitryushkin | SG Dynamo Dresden | German 2. Bundesliga | SUB | 64 | 69 | 675000 | 3000 | |
| 11764 | L. Klassen | WSG Tirol | Austrian Football Bundesliga | LB | 64 | 71 | 1100000 | 2000 | |
| 11938 | V. Karpov | PFC CSKA Moscow | Russian Premier League | RES | 64 | 79 | 1300000 | 2000 | |
| 13233 | E. Shlyakov | AFC UTA Arad | Romanian Liga I | LB | 63 | 63 | 425000 | 2000 | |
| 13240 | E. Sevikyan | Levante Unión Deportiva | Spain Primera Division | RES | 63 | 77 | 1100000 | 3000 | |
| 14148 | N. Iosifov | Villarreal CF | Spain Primera Division | RES | 62 | 75 | 950000 | 4000 | |
| 14366 | S. Babkin | FC Lokomotiv Moscow | Russian Premier League | SUB | 62 | 77 | 925000 | 3000 | |
| 14393 | V. Yakovlev | PFC CSKA Moscow | Russian Premier League | RES | 62 | 75 | 950000 | 5000 | |
| 15876 | A. Savin | FC Lokomotiv Moscow | Russian Premier League | SUB | 60 | 70 | 475000 | 3000 | |
| 16442 | Y. Mikhailov | FC Schalke 04 | German 2. Bundesliga | SUB | 59 | 76 | 575000 | 750 | |
| 16794 | V. Molchan | Stade Malherbe Caen | French Ligue 2 | RES | 58 | 67 | 425000 | 750 | |
| 16890 | V. Cherny | DSC Arminia Bielefeld | German 1. Bundesliga | SUB | 58 | 76 | 525000 | 1000 | |
| 17476 | A. Chernov | Vejle Boldklub | Danish Superliga | SUB | 56 | 66 | 275000 | 1000 | |
| 17597 | D. Markitesov | Spartak Moskva | Russian Premier League | RES | 56 | 73 | 375000 | 6000 | |
| 17724 | A. Thomas | Seattle Sounders FC | USA Major League Soccer | RES | 56 | 63 | 250000 | 850 | |
| 17966 | D. Bokov | PFC CSKA Moscow | Russian Premier League | SUB | 55 | 74 | 275000 | 500 | |
| 18109 | D. Khudyakov | FC Lokomotiv Moscow | Russian Premier League | SUB | 55 | 75 | 300000 | 500 | |
| 18434 | T. Akmurzin | Spartak Moskva | Russian Premier League | RES | 53 | 63 | 180000 | 4000 | |
| 18487 | V. Torop | PFC CSKA Moscow | Russian Premier League | SUB | 53 | 75 | 275000 | 500 | |
| 18683 | A. Poplevchenkov | Spartak Moskva | Russian Premier League | RES | 52 | 66 | 170000 | 3000 | |
| 18853 | I. Repyakh | Vejle Boldklub | Danish Superliga | RES | 52 | 66 | 190000 | 1000 |
Number of Russian players: 81.
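The filter above relies on `which()`, which keeps only the positions where the condition is `TRUE` and silently skips `NA`. A minimal sketch on a made-up data frame (`toy` and its values are purely illustrative) shows why this matters compared with plain logical indexing:

```r
# Made-up data: one valid Russian player, one with zero value, one with unknown value
toy <- data.frame(
  short_name  = c("A", "B", "C", "D"),
  nationality = c("Russia", "Russia", "Russia", "Portugal"),
  value_eur   = c(26500000, 0, NA, 45000000),
  stringsAsFactors = FALSE
)

cond <- toy$nationality == "Russia" & toy$value_eur != 0  # contains an NA for player C
idx  <- which(cond)                                       # positions where cond is TRUE

selected   <- toy[idx, ]    # only rows that really satisfy the condition
n_selected <- length(idx)   # the count reported after the table

# Indexing with the raw logical vector would instead add an all-NA row for player C
with_na_row <- toy[cond, ]
```

If the unknown values were encoded as `0` rather than `NA`, the `!= 0` comparison alone would already remove them; the sketch covers both cases.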
Let's take the 50 best footballers by FIFA `overall` rating (the first 50 rows).
top_world <- 1:50
url_world <- as.character(players_full$player_url[top_world])
cbind(1:50, players_full[top_world, c(3, 14, 15, 17, 6, 7, 8, 9)]) %>%
  kbl() %>%
  kable_paper("hover", full_width = FALSE, position = "left") %>%
  column_spec(6, color = "black", background = spec_color(players_full[top_world, 6])) %>%
  column_spec(7, color = "black", background = spec_color(players_full[top_world, 7])) %>%
  column_spec(8, color = "black", background = spec_color(players_full[top_world, 8])) %>%
  column_spec(9, color = "black", background = spec_color(players_full[top_world, 9])) %>%
  column_spec(4, color = spec_color(as.integer(players_full[top_world, 15]))) %>%
  column_spec(2, bold = TRUE, link = url_world)

| 1:50 | short_name | club_name | league_name | club_position | overall | potential | value_eur | wage_eur |
|---|---|---|---|---|---|---|---|---|
| 1 | L. Messi | Paris Saint-Germain | French Ligue 1 | RW | 93 | 93 | 78000000 | 320000 |
| 2 | R. Lewandowski | FC Bayern München | German 1. Bundesliga | ST | 92 | 92 | 119500000 | 270000 |
| 3 | Cristiano Ronaldo | Manchester United | English Premier League | ST | 91 | 91 | 45000000 | 270000 |
| 4 | Neymar Jr | Paris Saint-Germain | French Ligue 1 | LW | 91 | 91 | 129000000 | 270000 |
| 5 | K. De Bruyne | Manchester City | English Premier League | RCM | 91 | 91 | 125500000 | 350000 |
| 6 | J. Oblak | Atlético de Madrid | Spain Primera Division | GK | 91 | 93 | 112000000 | 130000 |
| 7 | K. Mbappé | Paris Saint-Germain | French Ligue 1 | ST | 91 | 95 | 194000000 | 230000 |
| 8 | M. Neuer | FC Bayern München | German 1. Bundesliga | GK | 90 | 90 | 13500000 | 86000 |
| 9 | M. ter Stegen | FC Barcelona | Spain Primera Division | GK | 90 | 92 | 99000000 | 250000 |
| 10 | H. Kane | Tottenham Hotspur | English Premier League | ST | 90 | 90 | 129500000 | 240000 |
| 11 | N. Kanté | Chelsea | English Premier League | RCM | 90 | 90 | 100000000 | 230000 |
| 12 | K. Benzema | Real Madrid CF | Spain Primera Division | CF | 89 | 89 | 66000000 | 350000 |
| 13 | T. Courtois | Real Madrid CF | Spain Primera Division | GK | 89 | 91 | 85500000 | 250000 |
| 14 | H. Son | Tottenham Hotspur | English Premier League | LW | 89 | 89 | 104000000 | 220000 |
| 15 | Casemiro | Real Madrid CF | Spain Primera Division | CDM | 89 | 89 | 88000000 | 310000 |
| 16 | V. van Dijk | Liverpool | English Premier League | LCB | 89 | 89 | 86000000 | 230000 |
| 17 | S. Mané | Liverpool | English Premier League | LW | 89 | 89 | 101000000 | 270000 |
| 18 | M. Salah | Liverpool | English Premier League | RW | 89 | 89 | 101000000 | 270000 |
| 19 | Ederson | Manchester City | English Premier League | GK | 89 | 91 | 94000000 | 200000 |
| 20 | J. Kimmich | FC Bayern München | German 1. Bundesliga | RDM | 89 | 90 | 108000000 | 160000 |
| 21 | Alisson | Liverpool | English Premier League | GK | 89 | 90 | 82000000 | 190000 |
| 22 | G. Donnarumma | Paris Saint-Germain | French Ligue 1 | GK | 89 | 93 | 119500000 | 110000 |
| 23 | Sergio Ramos | Paris Saint-Germain | French Ligue 1 | LCB | 88 | 88 | 24000000 | 115000 |
| 24 | L. Suárez | Atlético de Madrid | Spain Primera Division | RS | 88 | 88 | 44500000 | 135000 |
| 25 | T. Kroos | Real Madrid CF | Spain Primera Division | LCM | 88 | 88 | 75000000 | 310000 |
| 26 | R. Lukaku | Chelsea | English Premier League | ST | 88 | 88 | 93500000 | 260000 |
| 27 | K. Navas | Paris Saint-Germain | French Ligue 1 | SUB | 88 | 88 | 15500000 | 130000 |
| 28 | R. Sterling | Manchester City | English Premier League | SUB | 88 | 89 | 107500000 | 290000 |
| 29 | Bruno Fernandes | Manchester United | English Premier League | CAM | 88 | 89 | 107500000 | 250000 |
| 30 | E. Haaland | Borussia Dortmund | German 1. Bundesliga | RS | 88 | 93 | 137500000 | 110000 |
| 31 | S. Agüero | FC Barcelona | Spain Primera Division | ST | 87 | 87 | 51000000 | 260000 |
| 32 | H. Lloris | Tottenham Hotspur | English Premier League | GK | 87 | 87 | 13500000 | 125000 |
| 33 | L. Modrić | Real Madrid CF | Spain Primera Division | RCM | 87 | 87 | 32000000 | 190000 |
| 34 | A. Di María | Paris Saint-Germain | French Ligue 1 | SUB | 87 | 87 | 49500000 | 160000 |
| 35 | W. Szczęsny | Juventus | Italian Serie A | GK | 87 | 87 | 42000000 | 105000 |
| 36 | T. Müller | FC Bayern München | German 1. Bundesliga | CAM | 87 | 87 | 66000000 | 140000 |
| 37 | C. Immobile | Lazio | Italian Serie A | ST | 87 | 87 | 67500000 | 125000 |
| 38 | P. Pogba | Manchester United | English Premier League | RDM | 87 | 87 | 79500000 | 220000 |
| 39 | M. Verratti | Paris Saint-Germain | French Ligue 1 | LCM | 87 | 87 | 79500000 | 155000 |
| 40 | Marquinhos | Paris Saint-Germain | French Ligue 1 | RCB | 87 | 90 | 90500000 | 135000 |
| 41 | L. Goretzka | FC Bayern München | German 1. Bundesliga | LDM | 87 | 88 | 93000000 | 140000 |
| 42 | P. Dybala | Juventus | Italian Serie A | CAM | 87 | 88 | 93000000 | 160000 |
| 43 | A. Robertson | Liverpool | English Premier League | LB | 87 | 88 | 83500000 | 175000 |
| 44 | F. de Jong | FC Barcelona | Spain Primera Division | RCM | 87 | 92 | 119500000 | 210000 |
| 45 | T. Alexander-Arnold | Liverpool | English Premier League | RB | 87 | 92 | 114000000 | 150000 |
| 46 | J. Sancho | Manchester United | English Premier League | LM | 87 | 91 | 116500000 | 150000 |
| 47 | Rúben Dias | Manchester City | English Premier League | RCB | 87 | 91 | 102500000 | 170000 |
| 48 | G. Chiellini | Juventus | Italian Serie A | SUB | 86 | 86 | 12000000 | 88000 |
| 49 | S. Handanovič | Inter | Italian Serie A | GK | 86 | 86 | 7500000 | 78000 |
| 50 | M. Hummels | Borussia Dortmund | German 1. Bundesliga | LCB | 86 | 86 | 44000000 | 95000 |
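Taking the first 50 rows works only if the table is already sorted by `overall` in decreasing order, which the output above suggests. Below is a small sketch of how that could be checked, or sidestepped by ordering explicitly. The vector `overall_toy` reuses the first eight ratings from the table; the variable names are illustrative.

```r
# First eight overall ratings from the table above
overall_toy <- c(93, 92, 91, 91, 91, 91, 91, 90)

# TRUE if the ratings never increase from one row to the next
sorted_desc <- !is.unsorted(rev(overall_toy))

# Order-independent alternative: positions of the k highest ratings
k <- 2
top_k <- order(overall_toy, decreasing = TRUE)[seq_len(k)]
```

With ties at the cut-off (several players share a rating of 86 near row 50), which of the tied players make the top 50 depends on the row order of the file.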
There are quite a few positions in football, especially in the classification used here:
## [1] RW ST LW RCM GK CF CDM LCB RDM RS LCM SUB CAM RCB LDM LB RB LM RM
## [20] LS CB RES RWB RF CM LWB LAM LF RAM
## 30 Levels: CAM CB CDM CF CM GK LAM LB LCB LCM LDM LF LM LS LW LWB RAM ... SUB
R and L stand for right and left, F and B for forward and back, C for center, and S for striker.
Leaving goalkeepers aside, we usually speak of defense, midfield, and attack. This dataset contains characteristics that could help determine a player's likely position.
Let's see which characteristics these are:
## [1] "pace" "shooting"
## [3] "passing" "dribbling"
## [5] "defending" "physic"
## [7] "attacking_crossing" "attacking_finishing"
## [9] "attacking_heading_accuracy" "attacking_short_passing"
## [11] "attacking_volleys" "skill_dribbling"
## [13] "skill_curve" "skill_fk_accuracy"
## [15] "skill_long_passing" "skill_ball_control"
## [17] "movement_acceleration" "movement_sprint_speed"
## [19] "movement_agility" "movement_reactions"
## [21] "movement_balance" "power_shot_power"
## [23] "power_jumping" "power_stamina"
## [25] "power_strength" "power_long_shots"
## [27] "mentality_aggression" "mentality_interceptions"
## [29] "mentality_positioning" "mentality_vision"
## [31] "mentality_penalties" "mentality_composure"
## [33] "defending_marking_awareness" "defending_standing_tackle"
## [35] "defending_sliding_tackle" "goalkeeping_diving"
## [37] "goalkeeping_handling" "goalkeeping_kicking"
## [39] "goalkeeping_positioning" "goalkeeping_reflexes"
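The 40 features listed above occupy a contiguous block of columns (35 to 74 in the transposed table earlier in this report). One way to select such a block by name rather than by hard-coded position is sketched below. The data frame `toy` is a cut-down stand-in with values copied from that table, and the real definition of `skills_vars` is not shown in this report, so the names here are assumptions.

```r
# Cut-down stand-in for players_full: two players, a few columns in the original order
toy <- data.frame(
  sofifa_id            = c(158023, 20801),
  pace                 = c(85, 87),
  shooting             = c(92, 94),
  goalkeeping_reflexes = c(8, 11),
  goalkeeping_speed    = c(NA, NA),
  ls                   = c("89+3", "90+1"),
  stringsAsFactors = FALSE
)

# Locate the first and last feature by name, then take everything in between
first <- match("pace", names(toy))
last  <- match("goalkeeping_reflexes", names(toy))
skills_vars_toy <- names(toy)[first:last]

features <- toy[, skills_vars_toy]
```

Selecting by name keeps the code valid if columns are added or removed before `pace`, and it leaves out `goalkeeping_speed`, which is `NA` for outfield players.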
It seems that these characteristics should separate attacking players well from defenders and midfielders. Midfielders can be regarded here as all-round players. Nothing in the features hints at the right or left flank or at the preferred foot, so we hope that this factor will not drive the clusters.
To keep the clustering from being driven by age, potential, or overall level of play, we do not include those features either.
Since some characteristics are missing for goalkeepers, and goalkeepers clearly differ from outfield players anyway, we exclude them from consideration. We keep the characteristics whose names start with `goalkeeping`, since they may help to tell defenders apart.
## [1] FALSE FALSE FALSE FALSE FALSE TRUE
## [1] 2132
There are not that many of them.
Another problem is that the sample is large and may contain heterogeneity that we would like to avoid. For example, in lower leagues the boundaries between player types may be more blurred. Let's see how many players remain if we keep only players from top-tier league clubs.
## [1] TRUE TRUE TRUE TRUE TRUE TRUE
## [1] 14857
These features must also contain no NA values; we will remove them later, as there are few of them.
This leaves the following number of footballers:
## [1] 13193
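The counts above can be reproduced with logical masks. The sketch below assumes that goalkeepers are identified by `GK` in `player_positions` and top-tier leagues by `league_level == 1`. Both are guesses about the hidden code, and `toy` is made-up data.

```r
# Made-up data: six players with listed positions and league levels
toy <- data.frame(
  player_positions = c("RW, ST, CF", "ST", "GK", "CB", "GK", "CM"),
  league_level     = c(1, 1, 1, 2, 2, NA),
  stringsAsFactors = FALSE
)

# Assumption: a goalkeeper has "GK" among the listed positions
goalkeepers_toy <- grepl("GK", toy$player_positions, fixed = TRUE)

# %in% returns FALSE (not NA) when the league level is missing
top_leagues_toy <- toy$league_level %in% 1

n_goalkeepers <- sum(goalkeepers_toy)
n_top_leagues <- sum(top_leagues_toy)

# Outfield players from top-tier leagues
keep   <- top_leagues_toy & !goalkeepers_toy
n_keep <- sum(keep)
```

Using `%in%` instead of `==` keeps the mask free of `NA`, so it can be used directly for row indexing.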
There are many features, so we will do only a minimal analysis. We build two data frames: one with the features of interest and one with general information about each player, so that the results are easy to analyze later. NA values are removed, as promised.
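# Skill features of outfield players from top-tier leagues
players <- players_full[top_leagues & !goalkeepers, skills_vars]
players <- na.omit(players)
dim(players)
## [1] 13193 40
# General information about the same players, used to interpret the results
players_info <- players_full[top_leagues & !goalkeepers, c(3, 6, 7, 8, 9, 14, 17, 15, 22, 2)]
players_info <- na.omit(players_info)
dim(players_info)
## [1] 13193 10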
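Because `na.omit()` is applied to the two data frames separately, their rows stay aligned only if both calls drop the same rows. Equal row counts are consistent with that but do not guarantee it. A shared mask makes the alignment explicit, as sketched here on made-up data (`feat` and `info` are hypothetical stand-ins for `players` and `players_info`):

```r
# Made-up stand-ins: a missing feature in row 2, a missing wage in row 3
feat <- data.frame(pace = c(85, NA, 70, 66), shooting = c(92, 80, 55, 40))
info <- data.frame(short_name = c("A", "B", "C", "D"),
                   wage_eur   = c(320000, 270000, NA, 1000),
                   stringsAsFactors = FALSE)

# Keep a row only if it is complete in both data frames
complete   <- complete.cases(feat) & complete.cases(info)
feat_clean <- feat[complete, , drop = FALSE]
info_clean <- info[complete, , drop = FALSE]

# Separate na.omit() calls: same number of rows, but different players
feat_sep <- na.omit(feat)
info_sep <- na.omit(info)
```

In this toy case both `na.omit()` results have three rows, yet row 2 of one no longer describes the same player as row 2 of the other.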
## pace shooting passing dribbling defending
## Min. :28.00 Min. :18.0 Min. :25.00 Min. :26.00 Min. :15.00
## 1st Qu.:62.00 1st Qu.:42.0 1st Qu.:51.00 1st Qu.:57.00 1st Qu.:38.00
## Median :69.00 Median :55.0 Median :58.00 Median :64.00 Median :56.00
## Mean :68.33 Mean :52.8 Mean :57.88 Mean :63.03 Mean :52.03
## 3rd Qu.:76.00 3rd Qu.:64.0 3rd Qu.:65.00 3rd Qu.:70.00 3rd Qu.:65.00
## Max. :97.00 Max. :94.0 Max. :93.00 Max. :95.00 Max. :91.00
## physic attacking_crossing attacking_finishing
## Min. :29.00 Min. :15.00 Min. :10.00
## 1st Qu.:59.00 1st Qu.:45.00 1st Qu.:37.00
## Median :66.00 Median :56.00 Median :53.00
## Mean :64.89 Mean :54.56 Mean :50.65
## 3rd Qu.:72.00 3rd Qu.:65.00 3rd Qu.:64.00
## Max. :90.00 Max. :94.00 Max. :95.00
## attacking_heading_accuracy attacking_short_passing attacking_volleys
## Min. :17.00 Min. :23.00 Min. :10.00
## 1st Qu.:48.00 1st Qu.:58.00 1st Qu.:35.00
## Median :57.00 Median :64.00 Median :47.00
## Mean :56.75 Mean :63.41 Mean :46.89
## 3rd Qu.:65.00 3rd Qu.:70.00 3rd Qu.:58.00
## Max. :93.00 Max. :94.00 Max. :90.00
## skill_dribbling skill_curve skill_fk_accuracy skill_long_passing
## Min. :18.0 Min. :12.00 Min. :10.00 Min. :20.00
## 1st Qu.:55.0 1st Qu.:40.00 1st Qu.:34.00 1st Qu.:50.00
## Median :63.0 Median :52.00 Median :44.00 Median :59.00
## Mean :61.4 Mean :51.93 Mean :46.27 Mean :57.09
## 3rd Qu.:70.0 3rd Qu.:64.00 3rd Qu.:58.00 3rd Qu.:66.00
## Max. :96.0 Max. :94.00 Max. :94.00 Max. :93.00
## skill_ball_control movement_acceleration movement_sprint_speed
## Min. :24.00 Min. :27.00 Min. :27.00
## 1st Qu.:58.00 1st Qu.:62.00 1st Qu.:63.00
## Median :65.00 Median :69.00 Median :69.00
## Mean :63.91 Mean :68.29 Mean :68.34
## 3rd Qu.:70.00 3rd Qu.:76.00 3rd Qu.:76.00
## Max. :96.00 Max. :97.00 Max. :97.00
## movement_agility movement_reactions movement_balance power_shot_power
## Min. :27.00 Min. :29.00 Min. :26.00 Min. :20.00
## 1st Qu.:59.00 1st Qu.:56.00 1st Qu.:60.00 1st Qu.:51.00
## Median :68.00 Median :62.00 Median :68.00 Median :61.00
## Mean :66.68 Mean :62.36 Mean :66.94 Mean :59.65
## 3rd Qu.:75.00 3rd Qu.:68.00 3rd Qu.:75.00 3rd Qu.:70.00
## Max. :96.00 Max. :94.00 Max. :96.00 Max. :95.00
## power_jumping power_stamina power_strength power_long_shots
## Min. :29.00 Min. :24.0 Min. :19.00 Min. :11.00
## 1st Qu.:58.00 1st Qu.:61.0 1st Qu.:58.00 1st Qu.:40.00
## Median :66.00 Median :68.0 Median :67.00 Median :54.00
## Mean :65.77 Mean :67.4 Mean :65.59 Mean :51.59
## 3rd Qu.:74.00 3rd Qu.:75.0 3rd Qu.:74.00 3rd Qu.:64.00
## Max. :95.00 Max. :97.0 Max. :96.00 Max. :94.00
## mentality_aggression mentality_interceptions mentality_positioning
## Min. :20.00 Min. :10.00 Min. :12.00
## 1st Qu.:50.00 1st Qu.:35.00 1st Qu.:48.00
## Median :61.00 Median :56.00 Median :58.00
## Mean :59.65 Mean :50.92 Mean :55.88
## 3rd Qu.:70.00 3rd Qu.:65.00 3rd Qu.:66.00
## Max. :95.00 Max. :91.00 Max. :96.00
## mentality_vision mentality_penalties mentality_composure
## Min. :13.00 Min. :13.00 Min. :30.00
## 1st Qu.:48.00 1st Qu.:42.00 1st Qu.:53.00
## Median :58.00 Median :51.00 Median :61.00
## Mean :56.35 Mean :51.85 Mean :60.57
## 3rd Qu.:66.00 3rd Qu.:61.00 3rd Qu.:68.00
## Max. :95.00 Max. :93.00 Max. :96.00
## defending_marking_awareness defending_standing_tackle defending_sliding_tackle
## Min. :10.00 Min. :10.00 Min. :10.00
## 1st Qu.:37.00 1st Qu.:37.00 1st Qu.:34.00
## Median :55.00 Median :59.00 Median :56.00
## Mean :51.04 Mean :52.63 Mean :50.22
## 3rd Qu.:65.00 3rd Qu.:67.00 3rd Qu.:65.00
## Max. :93.00 Max. :93.00 Max. :92.00
## goalkeeping_diving goalkeeping_handling goalkeeping_kicking
## Min. : 2.00 Min. : 2.00 Min. : 2.00
## 1st Qu.: 8.00 1st Qu.: 8.00 1st Qu.: 8.00
## Median :10.00 Median :10.00 Median :10.00
## Mean :10.34 Mean :10.37 Mean :10.38
## 3rd Qu.:13.00 3rd Qu.:13.00 3rd Qu.:13.00
## Max. :29.00 Max. :33.00 Max. :31.00
## goalkeeping_positioning goalkeeping_reflexes
## Min. : 2.00 Min. : 2.00
## 1st Qu.: 8.00 1st Qu.: 8.00
## Median :10.00 Median :10.00
## Mean :10.37 Mean :10.33
## 3rd Qu.:13.00 3rd Qu.:13.00
## Max. :33.00 Max. :37.00
## No id variables; using all as measure variables
# alpha is a fixed setting, so it is passed to the geoms rather than mapped inside aes()
p <- ggplot(data = players_m, aes(y = variable, x = value, fill = variable)) +
  geom_boxplot(alpha = 0.7) +
  geom_violin(alpha = 0.7) +
  scale_fill_manual(values = viridis(40)) +
  guides(fill = "none")
p

As the plot shows, many of the distributions are bimodal, for example those of the defending and attacking features. This may be a good sign that the clustering will work (possibly even with a Gaussian model).
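The plot needs the data in long format, with one row per (variable, value) pair. The message `No id variables; using all as measure variables` suggests that `players_m` was produced by `melt()` from reshape2, although that call is not shown. A base R sketch of the same reshaping on a made-up two-column data frame:

```r
# Made-up stand-in for the wide feature table
players_toy <- data.frame(pace = c(85, 87, 60), defending = c(34, 34, 80))

# stack() turns every column into (values, ind) pairs
long <- stack(players_toy)

# Rename to the variable/value layout that the plotting code expects
players_m_toy <- data.frame(variable = long$ind, value = long$values)
```

With this layout, `aes(y = variable, x = value)` draws one distribution per feature.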
Let's try to reduce the dimensionality of the feature space and look at a biplot.
res$var$coord %>%
  kbl() %>%
  kable_paper("hover", full_width = FALSE, position = "left") %>%
  column_spec(2, color = "black", background = spec_color(abs(res$var$coord[, 1]))) %>%
  column_spec(3, color = "black", background = spec_color(abs(res$var$coord[, 2])))

| | Dim.1 | Dim.2 | Dim.3 | Dim.4 | Dim.5 |
|---|---|---|---|---|---|
| pace | 0.4803675 | -0.3412889 | 0.5521689 | 0.5362830 | -0.0038420 |
| shooting | 0.9031861 | -0.1770167 | -0.3129565 | 0.0110116 | -0.0063931 |
| passing | 0.8833757 | 0.3275755 | 0.1276242 | -0.2264294 | -0.0046164 |
| dribbling | 0.9432781 | 0.0044614 | 0.1485479 | -0.0024266 | -0.0286186 |
| defending | -0.1751087 | 0.9237815 | 0.2755775 | -0.0831520 | -0.0009376 |
| physic | 0.0837918 | 0.7716509 | -0.3182989 | 0.4479501 | -0.0157178 |
| attacking_crossing | 0.7481943 | 0.1448143 | 0.3218917 | -0.1297755 | 0.0435817 |
| attacking_finishing | 0.8358671 | -0.2995539 | -0.3078979 | 0.0439884 | -0.0070833 |
| attacking_heading_accuracy | 0.0475299 | 0.5399340 | -0.5331219 | 0.3765858 | -0.0575140 |
| attacking_short_passing | 0.7377009 | 0.4873940 | 0.0389338 | -0.1629982 | -0.0458902 |
| attacking_volleys | 0.8232193 | -0.1696750 | -0.3152016 | 0.0049402 | -0.0004373 |
| skill_dribbling | 0.9111194 | -0.0528693 | 0.1094648 | -0.0209284 | -0.0343958 |
| skill_curve | 0.8575685 | 0.0360943 | 0.0376135 | -0.1682756 | 0.0177360 |
| skill_fk_accuracy | 0.7556856 | 0.0770102 | -0.0303190 | -0.3009694 | 0.0483341 |
| skill_long_passing | 0.6035401 | 0.5489778 | 0.1500831 | -0.3025632 | -0.0211363 |
| skill_ball_control | 0.8846212 | 0.2003612 | -0.0024331 | -0.0373661 | -0.0491033 |
| movement_acceleration | 0.5004925 | -0.3598370 | 0.5712761 | 0.4488248 | 0.0042207 |
| movement_sprint_speed | 0.4347063 | -0.3050335 | 0.5026461 | 0.5759626 | -0.0102871 |
| movement_agility | 0.6814319 | -0.2551427 | 0.4719116 | 0.1567143 | 0.0412622 |
| movement_reactions | 0.6286497 | 0.5471269 | -0.1215489 | 0.1392700 | -0.0321509 |
| movement_balance | 0.4869166 | -0.2736985 | 0.5580139 | -0.0628797 | 0.0471450 |
| power_shot_power | 0.8115880 | 0.0725762 | -0.3239688 | 0.0168331 | -0.0231424 |
| power_jumping | -0.0051228 | 0.3817842 | -0.1027907 | 0.5724873 | 0.0264658 |
| power_stamina | 0.3800651 | 0.4967675 | 0.1686971 | 0.3596492 | 0.0260374 |
| power_strength | -0.0745845 | 0.5887408 | -0.5305355 | 0.4068123 | -0.0367999 |
| power_long_shots | 0.8776214 | -0.0496767 | -0.2322783 | -0.0732929 | 0.0073395 |
| mentality_aggression | 0.0855554 | 0.8016419 | -0.0621234 | 0.1722462 | -0.0042107 |
| mentality_interceptions | -0.1404271 | 0.8940623 | 0.2938862 | -0.1041136 | 0.0095903 |
| mentality_positioning | 0.8732683 | -0.1566944 | -0.1149013 | 0.0448532 | 0.0034783 |
| mentality_vision | 0.8774478 | 0.1018037 | 0.0072120 | -0.2056271 | -0.0093744 |
| mentality_penalties | 0.7206516 | -0.1555862 | -0.3966560 | -0.0286951 | 0.0007707 |
| mentality_composure | 0.7114847 | 0.4449490 | -0.1567992 | 0.0252829 | -0.0340337 |
| defending_marking_awareness | -0.1584522 | 0.8846805 | 0.2890390 | -0.0946016 | 0.0025191 |
| defending_standing_tackle | -0.2008635 | 0.8744922 | 0.3326012 | -0.1256241 | 0.0000211 |
| defending_sliding_tackle | -0.2363179 | 0.8515204 | 0.3566346 | -0.1167522 | 0.0015928 |
| goalkeeping_diving | 0.0212048 | 0.0504192 | -0.0397377 | 0.0065109 | 0.4959255 |
| goalkeeping_handling | 0.0299829 | 0.0505023 | -0.0453224 | 0.0122532 | 0.4947078 |
| goalkeeping_kicking | 0.0537447 | 0.0480149 | -0.0456355 | 0.0369209 | 0.4912717 |
| goalkeeping_positioning | 0.0364707 | 0.0570117 | -0.0684556 | 0.0035093 | 0.4429337 |
| goalkeeping_reflexes | 0.0412314 | 0.0430386 | -0.0506463 | 0.0101047 | 0.5012007 |
The first factor appears to characterize attacking play, and the second one defending.
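One way to make this reading of the loadings explicit is to list, for each factor, the variables whose absolute loading exceeds some cut-off. The sketch below copies a few rows and the first two columns from the table above; the 0.7 threshold and the object names (`loadings`, `dominant`) are assumptions chosen for illustration, not part of the original analysis.

```r
# A few loadings copied from the table above (first two factors only)
loadings <- rbind(
  skill_ball_control        = c(0.8846212, 0.2003612),
  power_long_shots          = c(0.8776214, -0.0496767),
  mentality_positioning     = c(0.8732683, -0.1566944),
  mentality_interceptions   = c(-0.1404271, 0.8940623),
  defending_standing_tackle = c(-0.2008635, 0.8744922),
  goalkeeping_diving        = c(0.0212048, 0.0504192)
)
colnames(loadings) <- c("factor1", "factor2")

# Hypothetical cut-off for calling a loading "large"
threshold <- 0.7

# For each factor, keep the names of variables with a large absolute loading
dominant <- lapply(colnames(loadings), function(f) {
  rownames(loadings)[abs(loadings[, f]) >= threshold]
})
names(dominant) <- colnames(loadings)
dominant
```

With this subset, the attacking skills end up under the first factor, the defensive ones under the second, and the goalkeeping variable under neither, which matches the interpretation given in the text.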
We also build a biplot and look up a few players to help interpret the result.
bestest <- c(1,4,3,5,29,15,23,16,61,47,56,115)
cbind(bestest, name = as.character(players_info$short_name[bestest]), position = as.character(players_info$club_position[bestest])) %>% kbl() %>% kable_paper("hover", full_width = F, position = "left")
| bestest | name | position |
|---|---|---|
| 1 | L. Messi | RW |
| 4 | Neymar Jr | LW |
| 3 | Cristiano Ronaldo | ST |
| 5 | K. De Bruyne | RCM |
| 29 | M. Verratti | LCM |
| 15 | J. Kimmich | RDM |
| 23 | S. Agüero | ST |
| 16 | Sergio Ramos | LCB |
| 61 | E. Cavani | SUB |
| 47 | R. Mahrez | RW |
| 56 | Rodri | CDM |
| 115 | L. Sané | LM |
The picture is not clear-cut, mostly because of the midfielders. Messi (1), Neymar (4) and De Bruyne (5, an attacking midfielder) are placed logically: they are on the attacking side.
Mahrez (47) currently plays as a wide midfielder (winger); besides joining attacks, such midfielders are expected to defend their zones against runs by the opponent's full-backs and holding midfielders. Kimmich (15) is also a midfielder. Agüero (23), on the other hand, is strictly speaking a striker. Rodri (56) is a defensive midfielder, so his position on the plot is a little odd. Still, the overall logic holds.
Note that players are numbered by overall rating, and it is easy to see that the indices in the bottom-left corner are large.
The points clearly form clouds: one can see either two large ones or three smaller ones.
library(kohonen)
set.seed(56788)
data_matrix <- as.matrix(scale(players))
# Create the SOM grid. The grid size has to be specified before
# training the SOM. Rectangular and hexagonal topologies are available.
som_grid <- somgrid(xdim = 20, ydim = 20, topo = "hexagonal")
# Train the SOM. The number of iterations (rlen), the learning rates (alpha)
# and the neighbourhood can all be adjusted.
som_model <- som(data_matrix,
                 grid = som_grid,
                 rlen = 200,
                 alpha = c(0.05, 0.01),
                 keep.data = TRUE)
It is good that the players are spread evenly across the cells. The map could perhaps be enlarged so that fewer individuals fall into each cell, but we keep it as is for now.
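The evenness of the map can also be checked numerically by counting how many observations land in each unit. The sketch below is self-contained and uses a simulated assignment vector as a stand-in; with a fitted kohonen model the corresponding vector would be `som_model$unit.classif`. The sample size and the object names are illustrative assumptions.

```r
set.seed(1)

# 20 x 20 grid, as in the SOM above
n_units <- 20 * 20

# Simulated winning-unit index per observation
# (stand-in for som_model$unit.classif; 5000 is an arbitrary sample size)
unit_classif <- sample(n_units, size = 5000, replace = TRUE)

# Number of observations per unit, including units that stay empty
counts <- tabulate(unit_classif, nbins = n_units)

summary(counts)    # spread of cell occupancy
sum(counts == 0)   # number of empty cells
```

A wide gap between the minimum and maximum counts, or many empty cells, would suggest changing the grid size. The same information is shown graphically by `plot(som_model, type = "counts")`.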
coolBlueHotRed <- function(n, alpha = 1) {rainbow(n, end=4/6, alpha=alpha)[n:1]}
som.hc <- cutree(hclust(object.distances(som_model, "codes")), 2)
#pdf("heatmapkoh")
par(mfrow = c(2,3))
# Heatmaps of the first six variables on the SOM grid, with cluster boundaries
for (i in 1:6) {
  plot(som_model, type = "property",
       property = getCodes(som_model)[, i], main = names(players)[i], palette.name = coolBlueHotRed)
  add.cluster.boundaries(som_model, som.hc, lwd = 4)
}
% TODO:: write up some explanation of what we see
## Warning in abbreviate(players_info$short_name[1:100], 10): abbreviate used with
## non-ASCII chars
labs_som[players_info$nationality == "Russia"] = as.character(players_info$short_name[players_info$nationality == "Russia"])
pdf()
plot(som_model, type = "mapping", labels = labs_som, cex = 0.3)
mclust
Since this method has come up before, we go through it quickly.
We use the well-known mclust package, which fits many variants of clustering models at once.
## Package 'mclust' version 5.4.5
## Type 'citation("mclust")' for citing this R package in publications.
##
## Attaching package: 'mclust'
## The following object is masked from 'package:kohonen':
##
## map
Here we look at the choice made by the Mclust() function. The choice is based on the computed model quality criteria BIC and ICL. Since BIC and ICL values are themselves random quantities, it is worth checking which other clustering variants have values close to the best one.
The method chose a partition into 9 clusters, which is too many for our purposes (this part was removed from the report for speed).
Our choice is based on a combination of factors:
The value of the model's Bayesian information criterion (BIC); with the mclust sign convention, larger is better. Since the BIC value is a random variable, it makes sense to consider several models with similar BIC but different numbers of clusters and parameters.
The number of clusters and the number of estimated parameters. At comparable BIC we choose the simplest model, that is, the one with the fewest estimated parameters: the fewer parameters have to be estimated, the smaller the variance of the estimates for a fixed sample size. We would like to end up with at most four clusters.
Let us look at all the BIC values and plot them.
playersBIC <- mclustBIC(players, G = 1:4)
bic_table <- playersBIC[,]
colnames(bic_table) <- colnames(playersBIC)
rownames(bic_table) <- 1:nrow(bic_table)
| G | EII | VII | EEI | VEI | EVI | VVI | EEE | EVE | VEE | VVE | EEV | VEV | EVV | VVV |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | -4176528 | -4176528 | -4009780 | -4009780 | -4009780 | -4009780 | -2971174 | -2971174 | -2971174 | -2971174 | -2971174 | -2971174 | -2971174 | -2971174 |
| 2 | -4020284 | -4014455 | -3862532 | -3862244 | -3855064 | -3851498 | -2967176 | -2943646 | -2964013 | -2943784 | -2928977 | -2928974 | -2927618 | -2927619 |
| 3 | -3889482 | -3885953 | -3779774 | -3777249 | -3763502 | -3767724 | -2957998 | -2931667 | -2954104 | -2929064 | -2922661 | -2920979 | -2921040 | -2918752 |
| 4 | -3836340 | -3835749 | -3727901 | -3727359 | -3709331 | -3709088 | -2951693 | -2927433 | -2948048 | -2923912 | -2921751 | -2920219 | -2919935 | -2917594 |
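To see which candidates are close to the best one, the BIC values can be ranked and expressed as a gap to the maximum. The sketch below retypes the last four columns of the table above by hand (rows are the number of components, 1 to 4); the object names `bic`, `ranked` and `delta` are illustrative assumptions, and how large a gap still counts as "comparable" is left to judgement.

```r
# BIC values for the four most flexible models, copied from the table above
bic <- matrix(
  c(-2971174, -2971174, -2971174, -2971174,
    -2928977, -2928974, -2927618, -2927619,
    -2922661, -2920979, -2921040, -2918752,
    -2921751, -2920219, -2919935, -2917594),
  nrow = 4, byrow = TRUE,
  dimnames = list(G = as.character(1:4),
                  model = c("EEV", "VEV", "EVV", "VVV"))
)

# Long format: one row per (G, model) pair
ranked <- as.data.frame(as.table(bic), responseName = "BIC")

# Sort from best (largest BIC, mclust convention) to worst
ranked <- ranked[order(ranked$BIC, decreasing = TRUE), ]

# Gap to the best model
ranked$delta <- ranked$BIC[1] - ranked$BIC

head(ranked, 3)
```

The top rows make the trade-off visible: the gap between the best and the second-best candidate can be weighed against the extra component and the extra parameters it requires.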
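We chose the VVV model with 3 components. In the table above, VVV with 4 components has a slightly higher BIC (-2917594 versus -2918752), but at comparable BIC we prefer the simpler model.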
## ----------------------------------------------------
## Gaussian finite mixture model fitted by EM algorithm
## ----------------------------------------------------
##
## Mclust VVV (ellipsoidal, varying volume, shape, and orientation) model with 3
## components:
##
## log-likelihood n df BIC ICL
## -1447144 13193 2582 -2918785 -2919823
##
## Clustering table:
## 1 2 3
## 4205 3721 5267
Based on our assumptions and on the principal component plot, we can name the classes.
classes_mclust <- as.factor(players_mclust$classification)
table(players_mclust$classification)/num_players
##
## 1 2 3
## 0.3187296 0.2820435 0.3992269
levels(classes_mclust) <- c("Midfielder","Defence","Attack")
fviz_pca_biplot(res,
label = "all",
col.ind = classes_mclust,
legend.title = "Players")
Cluster quality can be assessed with the uncertainty measure, computed as one minus the probability of the most probable class. It shows quite well how much the classes overlap. Let us look at several quantiles.
## 60% 70% 80% 90% 95% 97.5%
## 0.0002221314 0.0022168335 0.0170623789 0.1055412775 0.2560235155 0.3721556604
## 99% 99.5%
## 0.4529252851 0.4772857173
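The definition of uncertainty can be written out directly. The sketch below uses a small made-up matrix of posterior class probabilities (one row per observation, one column per class); with a fitted model the corresponding matrix would be `players_mclust$z`, and mclust also stores the result in `players_mclust$uncertainty`. The numbers are illustrative assumptions.

```r
# Hypothetical posterior probabilities for three observations and three classes
z <- rbind(
  c(0.98, 0.01, 0.01),  # confidently class 1
  c(0.50, 0.45, 0.05),  # torn between classes 1 and 2
  c(0.10, 0.20, 0.70)   # mostly class 3
)

# Uncertainty: one minus the probability of the most probable class
uncertainty <- 1 - apply(z, 1, max)
uncertainty

# Quantiles summarize how many observations sit near a class boundary
quantile(uncertainty, probs = c(0.5, 0.9))
```

For three classes the uncertainty cannot exceed 2/3, so values around 0.45 in the upper quantiles above correspond to players who are almost evenly split between two clusters.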
To be able to compare the methods, we compute the within-cluster and between-cluster sums of squares and report the ratio BSS/TSS.
# Center the data on the grand mean and count the observations in each cluster.
data.cent <- scale(players, scale=FALSE)
nrows <- table(classes_mclust)
# Total sum of squares
TSS <- sum(data.cent^2)
# Within-cluster sums of squares, one value per cluster
WSS <- sapply(split(players, classes_mclust), function(x) sum(scale(x, scale=FALSE)^2))
# Between-cluster sum of squares as the remainder TSS - WSS
BSS <- TSS - sum(WSS)
# The same quantity computed directly from the cluster means
gmeans <- sapply(split(players, classes_mclust), colMeans)
means <- colMeans(players)
BSS <- sum(colSums((gmeans - means)^2) * nrows)
BSS/TSS
## [1] 0.3897062
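Because the same computation is repeated for every clustering method, it can be wrapped in a helper. The sketch below is one possible way to do this; the function name `bss_tss` and the simulated data are assumptions made for illustration. As a sanity check it compares the result with the `betweenss` and `totss` components that `kmeans()` returns.

```r
# Share of the total sum of squares explained by a partition
bss_tss <- function(x, cl) {
  x <- as.matrix(x)
  cl <- as.factor(cl)
  # Total sum of squares around the grand mean
  tss <- sum(sweep(x, 2, colMeans(x))^2)
  # Within-cluster sum of squares around each cluster mean
  wss <- sum(sapply(split(seq_len(nrow(x)), cl), function(idx) {
    sum(scale(x[idx, , drop = FALSE], scale = FALSE)^2)
  }))
  (tss - wss) / tss
}

# Simulated data: two groups of 50 points in two dimensions
set.seed(28)
x <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 4), ncol = 2))

km <- kmeans(x, centers = 2)
ratio <- bss_tss(x, km$cluster)

# Should agree with the decomposition reported by kmeans itself
all.equal(ratio, km$betweenss / km$totss)
```

The helper takes any vector of labels, so it applies equally to labels coming from a mixture model or from a cut dendrogram.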
Let us look at individual players in a table:
| rate | name | position | mclust |
|---|---|---|---|
| 1 | L. Messi | RW | Attack |
| 3 | Cristiano Ronaldo | ST | Attack |
| 4 | Neymar Jr | LW | Attack |
| 5 | K. De Bruyne | RCM | Attack |
| 15 | Casemiro | CDM | Defence |
| 18 | M. Salah | RW | Attack |
| 20 | J. Kimmich | RDM | Defence |
| 23 | Sergio Ramos | LCB | Midfielder |
| 31 | S. Agüero | ST | Attack |
| 33 | L. Modrić | RCM | Defence |
| 39 | M. Verratti | LCM | Defence |
| 40 | Marquinhos | RCB | Midfielder |
| 47 | Rúben Dias | RCB | Midfielder |
| 48 | G. Chiellini | SUB | Midfielder |
| 53 | Sergio Busquets | CDM | Defence |
| 59 | R. Mahrez | RW | Attack |
| 68 | Rodri | CDM | Defence |
| 75 | E. Cavani | SUB | Attack |
| 97 | M. de Ligt | LCB | Midfielder |
| 99 | Jesús Navas | RB | Defence |
| 100 | Piqué | LCB | Midfielder |
| 133 | L. Sané | LM | Attack |
russian <- (players_info$nationality == "Russia")
cbind(rate = rownames(players[russian,]), name = as.character(players_info$short_name[russian]), position = as.character(players_info$club_position[russian]), mclust = as.character(classes_mclust)[russian]) %>% kbl() %>% kable_paper(full_width = F, "hover") %>%
column_spec(4, color = "white", background = spec_color(as.numeric(classes_mclust[russian]))) %>%
column_spec(2, color = "blue", bold = T, link = players_info$player_url[russian])
| rate | name | position | mclust |
|---|---|---|---|
| 221 | Mário Fernandes | RB | Defence |
| 617 | A. Golovin | LF | Defence |
| 759 | R. Zobnin | RDM | Defence |
| 766 | A. Miranchuk | SUB | Attack |
| 1034 | G. Dzhikiya | LCB | Midfielder |
| 1180 | F. Smolov | RS | Attack |
| 1181 | A. Dzagoev | CAM | Defence |
| 1251 | D. Cheryshev | LM | Attack |
| 1325 | A. Miranchuk | SUB | Attack |
| 1561 | A. Kokorin | SUB | Attack |
| 1806 | D. Barinov | RCM | Defence |
| 1890 | A. Sobolev | ST | Attack |
| 2032 | G. Schennikov | SUB | Defence |
| 2283 | Z. Bakaev | SUB | Attack |
| 2306 | R. Zhemaletdinov | RM | Attack |
| 2320 | F. Chalov | ST | Attack |
| 2932 | I. Oblyakov | LB | Attack |
| 3039 | I. Diveev | RCB | Midfielder |
| 3166 | V. Vasin | SUB | Midfielder |
| 3509 | I. Akhmetov | RDM | Defence |
| 3542 | D. Zhivoglyadov | SUB | Defence |
| 3544 | S. Iljutcenko | ST | Attack |
| 3694 | K. Kuchaev | RM | Attack |
| 3877 | F. Kudryashov | SUB | Midfielder |
| 4078 | I. Kutepov | SUB | Midfielder |
| 4215 | R. Mirzov | SUB | Attack |
| 4521 | N. Rasskazov | RB | Midfielder |
| 4554 | S. Magkeev | RCB | Midfielder |
| 4621 | K. Nababkin | SUB | Defence |
| 4624 | A. Eschenko | SUB | Midfielder |
| 4714 | A. Zabolotnyi | SUB | Attack |
| 5338 | D. Kulikov | LCM | Defence |
| 5369 | D. Rybchinskiy | LM | Attack |
| 5370 | N. Umyarov | SUB | Defence |
| 6282 | K. Maradishvili | RES | Defence |
| 6283 | P. Maslov | RES | Midfielder |
| 6403 | A. Silyanov | RB | Midfielder |
| 6539 | E. Bashkirov | RDM | Defence |
| 7437 | M. Mukhin | LDM | Attack |
| 8255 | M. Suleymanov | SUB | Attack |
| 9347 | I. Zhigulev | SUB | Defence |
| 9507 | N. Tiknizyan | RES | Defence |
| 9518 | A. Lomovitskiy | SUB | Attack |
| 10292 | G. Melkadze | SUB | Attack |
| 10540 | I. Shinozuka | RES | Attack |
| 10694 | M. Ignatov | SUB | Attack |
| 10746 | I. Gaponov | RES | Midfielder |
| 11038 | M. Nenakhov | RES | Midfielder |
| 11764 | L. Klassen | LB | Defence |
| 11938 | V. Karpov | RES | Midfielder |
| 13233 | E. Shlyakov | LB | Midfielder |
| 13240 | E. Sevikyan | RES | Attack |
| 14148 | N. Iosifov | RES | Attack |
| 14366 | S. Babkin | SUB | Midfielder |
| 14393 | V. Yakovlev | RES | Attack |
| 16890 | V. Cherny | SUB | Attack |
| 17597 | D. Markitesov | RES | Midfielder |
| 18853 | I. Repyakh | RES | Attack |
plot(1:40, players[1,], "l", col = "red", xlab = "variable", ylab = "points")
lines(1:40, players[2,], col = "red")
lines(1:40, players[1033,], col = "red", lty = 2)
lines(1:40, players[11,], col = "blue")
lines(1:40, players[12,], col = "blue")
lines(1:40, players[3180,], col = "blue", lty = 2)
cor(t(players[c(1,2, 1033, 16,15, 3180),]))
## 1 2 1180 23 20 3877
## 1 1.0000000 0.9190866 0.9353127 0.5348853 0.6496237 0.4511132
## 2 0.9190866 1.0000000 0.9659547 0.6823704 0.6801039 0.5521709
## 1180 0.9353127 0.9659547 1.0000000 0.5991011 0.6429120 0.5377683
## 23 0.5348853 0.6823704 0.5991011 1.0000000 0.8646511 0.8626866
## 20 0.6496237 0.6801039 0.6429120 0.8646511 1.0000000 0.8697465
## 3877 0.4511132 0.5521709 0.5377683 0.8626866 0.8697465 1.0000000
## 1 2 1180 23 20
## 2 76.37408
## 1180 96.22370 75.35250
## 23 171.58380 133.16156 149.67966
## 20 150.03999 133.53277 144.99310 80.51708
## 3877 195.78304 169.65848 135.03333 118.25396 121.43723
TODO:: add entanglement https://uc-r.github.io/hc_clustering
players_dist <- as.dist(1 - cor(t(players)))
players_hclust <- hclust(players_dist, method="complete")
plot(players_hclust, labels = labs_som, cex = 0.4, main = "Dendrogram (Complete linkage)")
classes_hclust <- cutree(players_hclust, k = 3)
classes_hclust <- as.factor(classes_hclust)
levels(classes_hclust) <- c("Attack", "Midfielder", "Defence")
table(classes_hclust)/num_players
## classes_hclust
## Attack Midfielder Defence
## 0.3904343 0.4235579 0.1860077
The Defence cluster has become considerably smaller than with mclust (about 19% of the players versus about 28%).
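Beyond comparing cluster sizes, one could cross-tabulate the labels from two methods to see which players moved between clusters. The sketch below uses two short made-up label vectors; in the report the inputs would be vectors such as `classes_mclust` and `classes_hclust`. Note the assumption that the cluster names mean the same thing in both methods, which holds only as far as the manual naming is consistent.

```r
# Hypothetical labels for six players from two clustering methods
labels_a <- factor(c("Attack", "Attack", "Defence", "Midfielder", "Defence", "Attack"))
labels_b <- factor(c("Attack", "Attack", "Midfielder", "Defence", "Defence", "Attack"))

# Rows: first method, columns: second method
agreement <- table(method_a = labels_a, method_b = labels_b)
agreement

# Share of players that receive the same label from both methods
sum(diag(agreement)) / sum(agreement)
```

Off-diagonal cells show which pairs of clusters exchange players, which is more informative than the marginal proportions alone.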
Let us look at the biplot and check that the result is broadly similar to what we saw earlier.
Again, a caveat is needed here: these clusters were obtained with a different objective (the dissimilarity is based on the correlation between individuals, 1 - cor, rather than on Euclidean distance), so the sum-of-squares figure that follows is not an entirely fair measure.
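The difference between the two objectives can be illustrated on a toy example. In the sketch below, four made-up "players" have one of two profile shapes at either a low or a high overall level; all names and values are assumptions. The correlation-based dissimilarity groups them by shape, while Euclidean distance groups them by level.

```r
# Two profile shapes, each at a low and a high overall level
shape_a <- c(1, 2, 3, 4, 5)
shape_b <- c(5, 4, 3, 2, 1)
x <- rbind(
  a_low  = shape_a,
  a_high = shape_a * 10 + 50,
  b_low  = shape_b,
  b_high = shape_b * 10 + 50
)

# Correlation-based dissimilarity between rows, as used for hclust above
d_cor <- as.dist(1 - cor(t(x)))
# Ordinary Euclidean distance between rows
d_euc <- dist(x)

cl_cor <- cutree(hclust(d_cor, method = "complete"), k = 2)
cl_euc <- cutree(hclust(d_euc, method = "complete"), k = 2)

# Compare the two partitions side by side
rbind(correlation = cl_cor, euclidean = cl_euc)
```

Sums of squares measure Euclidean spread, so a partition that is good in the correlation sense (same shape, different level) can still score poorly on BSS/TSS.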
# Center the data on the grand mean and count the observations in each cluster.
data.cent <- scale(players, scale=FALSE)
nrows <- table(cutree(players_hclust, k = 3))
# Total sum of squares
TSS <- sum(data.cent^2)
# Within-cluster sums of squares, one value per cluster
WSS <- sapply(split(players, classes_hclust), function(x) sum(scale(x, scale=FALSE)^2))
# Between-cluster sum of squares as the remainder TSS - WSS
BSS <- TSS - sum(WSS)
# The same quantity computed directly from the cluster means
gmeans <- sapply(split(players, classes_hclust), colMeans)
means <- colMeans(players)
BSS <- sum(colSums((gmeans - means)^2) * nrows)
BSS/TSS
## [1] 0.3015825
| rate | name | position | hclust |
|---|---|---|---|
| 1 | L. Messi | RW | Attack |
| 3 | Cristiano Ronaldo | ST | Attack |
| 4 | Neymar Jr | LW | Attack |
| 5 | K. De Bruyne | RCM | Attack |
| 15 | Casemiro | CDM | Defence |
| 18 | M. Salah | RW | Attack |
| 20 | J. Kimmich | RDM | Midfielder |
| 23 | Sergio Ramos | LCB | Defence |
| 31 | S. Agüero | ST | Attack |
| 33 | L. Modrić | RCM | Attack |
| 39 | M. Verratti | LCM | Midfielder |
| 40 | Marquinhos | RCB | Defence |
| 47 | Rúben Dias | RCB | Defence |
| 48 | G. Chiellini | SUB | Defence |
| 53 | Sergio Busquets | CDM | Defence |
| 59 | R. Mahrez | RW | Attack |
| 68 | Rodri | CDM | Defence |
| 75 | E. Cavani | SUB | Attack |
| 97 | M. de Ligt | LCB | Defence |
| 99 | Jesús Navas | RB | Attack |
| 100 | Piqué | LCB | Defence |
| 133 | L. Sané | LM | Attack |
russian <- (players_info$nationality == "Russia")
cbind(rate = rownames(players[russian,]), name = as.character(players_info$short_name[russian]), position = as.character(players_info$club_position[russian]), hclust = as.character(classes_hclust)[russian]) %>% kbl() %>% kable_paper(full_width = F, "hover") %>%
column_spec(4, color = "white", background = spec_color(as.numeric(classes_hclust[russian]))) %>%
column_spec(2, color = "blue", bold = T, link = players_info$player_url[russian])
| rate | name | position | hclust |
|---|---|---|---|
| 221 | Mário Fernandes | RB | Midfielder |
| 617 | A. Golovin | LF | Midfielder |
| 759 | R. Zobnin | RDM | Midfielder |
| 766 | A. Miranchuk | SUB | Attack |
| 1034 | G. Dzhikiya | LCB | Midfielder |
| 1180 | F. Smolov | RS | Attack |
| 1181 | A. Dzagoev | CAM | Midfielder |
| 1251 | D. Cheryshev | LM | Attack |
| 1325 | A. Miranchuk | SUB | Attack |
| 1561 | A. Kokorin | SUB | Attack |
| 1806 | D. Barinov | RCM | Defence |
| 1890 | A. Sobolev | ST | Attack |
| 2032 | G. Schennikov | SUB | Midfielder |
| 2283 | Z. Bakaev | SUB | Attack |
| 2306 | R. Zhemaletdinov | RM | Attack |
| 2320 | F. Chalov | ST | Attack |
| 2932 | I. Oblyakov | LB | Attack |
| 3039 | I. Diveev | RCB | Defence |
| 3166 | V. Vasin | SUB | Defence |
| 3509 | I. Akhmetov | RDM | Attack |
| 3542 | D. Zhivoglyadov | SUB | Midfielder |
| 3544 | S. Iljutcenko | ST | Attack |
| 3694 | K. Kuchaev | RM | Midfielder |
| 3877 | F. Kudryashov | SUB | Defence |
| 4078 | I. Kutepov | SUB | Defence |
| 4215 | R. Mirzov | SUB | Midfielder |
| 4521 | N. Rasskazov | RB | Midfielder |
| 4554 | S. Magkeev | RCB | Defence |
| 4621 | K. Nababkin | SUB | Midfielder |
| 4624 | A. Eschenko | SUB | Midfielder |
| 4714 | A. Zabolotnyi | SUB | Attack |
| 5338 | D. Kulikov | LCM | Midfielder |
| 5369 | D. Rybchinskiy | LM | Midfielder |
| 5370 | N. Umyarov | SUB | Midfielder |
| 6282 | K. Maradishvili | RES | Attack |
| 6283 | P. Maslov | RES | Midfielder |
| 6403 | A. Silyanov | RB | Midfielder |
| 6539 | E. Bashkirov | RDM | Midfielder |
| 7437 | M. Mukhin | LDM | Midfielder |
| 8255 | M. Suleymanov | SUB | Attack |
| 9347 | I. Zhigulev | SUB | Midfielder |
| 9507 | N. Tiknizyan | RES | Midfielder |
| 9518 | A. Lomovitskiy | SUB | Attack |
| 10292 | G. Melkadze | SUB | Attack |
| 10540 | I. Shinozuka | RES | Attack |
| 10694 | M. Ignatov | SUB | Attack |
| 10746 | I. Gaponov | RES | Midfielder |
| 11038 | M. Nenakhov | RES | Midfielder |
| 11764 | L. Klassen | LB | Midfielder |
| 11938 | V. Karpov | RES | Midfielder |
| 13233 | E. Shlyakov | LB | Midfielder |
| 13240 | E. Sevikyan | RES | Attack |
| 14148 | N. Iosifov | RES | Attack |
| 14366 | S. Babkin | SUB | Midfielder |
| 14393 | V. Yakovlev | RES | Attack |
| 16890 | V. Cherny | SUB | Attack |
| 17597 | D. Markitesov | RES | Midfielder |
| 18853 | I. Repyakh | RES | Midfielder |
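
A long table like the one above is easier to read once the labels are tallied per cluster. The following is a minimal sketch of that step in base R. It does not use the report's data: `classes_toy` and `mask_toy` are hypothetical stand-ins for `classes_hclust` and `russian`, and it assumes the cluster labels are stored as a factor.

```r
# Hypothetical stand-in for `classes_hclust`: one cluster label per player
classes_toy <- factor(
  c("Attack", "Midfielder", "Defence", "Attack", "Midfielder", "Attack"),
  levels = c("Attack", "Midfielder", "Defence")
)
# Hypothetical stand-in for the `russian` logical mask
mask_toy <- c(TRUE, FALSE, TRUE, TRUE, TRUE, FALSE)

# Count how many selected players fall into each cluster
counts <- table(classes_toy[mask_toy])
# The same tally expressed as shares of the selected players
shares <- prop.table(counts)

print(counts)
print(round(shares, 2))
```

With the real objects, the same two calls would be applied to `classes_hclust[russian]`.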
set.seed(28)
num_of_clust <- 3
players_kmeans <- kmeans(players, num_of_clust)
classes_kmeans <- players_kmeans$cluster

The BSS/TSS ratio:
## [1] 0.4510039
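
The report does not show how this number is obtained. One plausible way, sketched below, is to divide the `betweenss` component of the `kmeans` result by `totss`. The data here is a synthetic stand-in (`toy`), not the `players` matrix, and `nstart = 10` is an assumption added for stability, not something the original call uses.

```r
set.seed(28)

# Hypothetical stand-in for `players`: three shifted Gaussian blobs, 4 features each
toy <- rbind(
  matrix(rnorm(400, mean = 0), ncol = 4),
  matrix(rnorm(400, mean = 4), ncol = 4),
  matrix(rnorm(400, mean = 8), ncol = 4)
)

fit <- kmeans(toy, centers = 3, nstart = 10)

# Share of the total sum of squares explained by the split into clusters:
# between-cluster sum of squares divided by total sum of squares
ratio <- fit$betweenss / fit$totss
print(ratio)

# Sanity check of the decomposition TSS = BSS + WSS
print(all.equal(fit$totss, fit$betweenss + fit$tot.withinss))
```

A ratio close to 1 means the clusters are compact and well separated. By that reading, the value of about 0.45 reported above means the three clusters account for a little under half of the total variation in the data.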
##TODO
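# Logical mask: TRUE for players whose nationality is Cyprus
cyprus <- (players_info$nationality == "Cyprus")
f1 <- which(cyprus)
f2 <- which(russian)
# Grouping factor for colouring: "" for everyone else, then the two countries
factorr <- rep("", num_players)
factorr[f1] <- "Cyprus"
factorr[f2] <- "Russia"
factorr <- as.factor(factorr)
# PCA biplot that shows and labels only the players from Russia and Cyprus
fviz_pca_biplot(res,
                label = "ind",
                col.ind = factorr,
                legend.title = "Players from Russia and Cyprus",
                select.ind = list(name = rownames(players_info[russian | cyprus, ]))
)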